Automatic Classification of Large Changes into Maintenance Categories
نویسندگان
چکیده
Large software systems undergo significant evolution during their lifespan, yet often individual changes are not well documented. In this work, we seek to automatically classify large changes into various categories of maintenance tasks — corrective, adaptive, perfective, feature addition, and non-functional improvement — using machine learning techniques. In a previous paper, we found that many commits could be classified easily and reliably based solely on the manual analysis of the commit metadata and commit messages (i.e., without reference to the source code). Our extension is the automation of classification by training Machine Learners on features extracted from the commit metadata, such as the word distribution of a commit message, commit author, and modules modified. We validated the results of the learners via 10-fold cross validation, which achieved accuracies consistently above 50%, indicating good to fair results. We found that the identity of the author of a commit provided much information about the maintenance class of a commit, almost as much as the words of the commit message. This implies that for most large commits, the Source Control System (SCS) commit messages plus the commit author identity is enough information to accurately and automatically categorize the nature of the maintenance task.
منابع مشابه
An Automatic Fingerprint Classification Algorithm
Manual fingerprint classification algorithms are very time consuming, and usually not accurate. Fast and accurate fingerprint classification is essential to each AFIS (Automatic Fingerprint Identification System). This paper investigates a fingerprint classification algorithm that reduces the complexity and costs associated with the fingerprint identification procedure. A new structural algorit...
متن کاملDevelopment of an Automatic Land Use Extraction System in Urban Areas using VHR Aerial Imagery and GIS Vector Data
Lack of detailed land use (LU) information and efficient data collection methods have made the modeling of urban systems difficult. This study aims to develop a novel hierarchical rule-based LU extraction framework using geographic vector and remotely sensed (RS) data, in order to extract detailed subzonal LU information, residential LU in this study. The LU extraction system is developed to ex...
متن کاملAn Automatic Fingerprint Classification Algorithm
Manual fingerprint classification algorithms are very time consuming, and usually not accurate. Fast and accurate fingerprint classification is essential to each AFIS (Automatic Fingerprint Identification System). This paper investigates a fingerprint classification algorithm that reduces the complexity and costs associated with the fingerprint identification procedure. A new structural algorit...
متن کاملAutomatic Maintenance of Web Directories by Mining Web Browsing Data
Received December 24, 2010 Revised May 26, 2011 Web directories allow Web users to browse a hierarchy of categories, under which different types of resources are classified. We study the problem of maintaining a Web directory, that is, the problem of continually discovering and ranking resources that are relevant to the categories of the directory. We propose an unsupervised computational metho...
متن کاملDeveloping a New Method in Object Based Classification to Updating Large Scale Maps with Emphasis on Building Feature
According to the cities expansion, updating urban maps for urban planning is important and its effectiveness is depend on the information extraction / change detection accuracy. Information extraction methods are divided into two groups, including Pixel-Based (PB) and Object-Based (OB). OB analysis has overcome the limitations of PB analysis (producing salt-pepper results and features with hole...
متن کامل